Update model benchmarks with latest Arena Elo scores by insign · Pull Request #304 · verseles/showdown

insign · 2026-04-21T17:15:36Z

Updates LMArena Elo scores for the latest models in data/showdown.json.

Specifically, this PR updates scores for:

- `claude-sonnet-4-6-20260217`
- `gpt-5.1` and `gpt-5.1-high`
- `gpt-5.2` and `gpt-5.2-high`
- `gpt-5.4-high`
- `deepseek-v3.2` and `deepseek-v3.2-thinking`

Values were verified through recent queries of the Arena leaderboard. All fields without a high-confidence update were preserved to maintain dataset integrity. Also updated the meta.last_update timestamp. All validation checks have passed.
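For reviewers, the "update only verified fields" approach described above could look roughly like the sketch below. It is an illustration under assumptions, not the script Jules actually ran: the layout of `data/showdown.json` (a top-level `models` list whose entries carry an `id`) and the helper name `apply_elo_updates` are hypothetical, and only the `lmarena_*_elo` naming pattern and `meta.last_update` come from this PR.

```python
import copy
from datetime import datetime, timezone


def apply_elo_updates(dataset, updates, now=None):
    """Return a copy of `dataset` with only the supplied lmarena_*_elo fields changed.

    ASSUMPTION: `dataset` looks like {"meta": {...}, "models": [{"id": ..., ...}]}.
    The real schema of data/showdown.json may differ.
    """
    result = copy.deepcopy(dataset)  # never mutate the caller's data
    by_id = {model["id"]: model for model in result.get("models", [])}

    for model_id, fields in updates.items():
        model = by_id.get(model_id)
        if model is None:
            continue  # unknown model: skip rather than invent an entry
        for key, value in fields.items():
            # Touch only Arena Elo fields that have a concrete new value;
            # every other field is preserved as-is.
            is_elo_field = key.startswith("lmarena_") and key.endswith("_elo")
            if is_elo_field and value is not None:
                model[key] = value

    stamp = (now or datetime.now(timezone.utc)).isoformat()
    result.setdefault("meta", {})["last_update"] = stamp
    return result
```

Passing `None` for a score, or omitting a field from `updates`, leaves the stored value unchanged. That is one way to honor the "preserve fields without a high-confidence update" rule.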

PR created automatically by Jules for task 11938924724193172488 started by @insign

Updated `lmarena_*_elo` values for several recent models:

- Claude Sonnet 4.6
- GPT-5.1, GPT-5.1 High, GPT-5.2, GPT-5.2 High, GPT-5.4 High
- DeepSeek V3.2, DeepSeek V3.2 Thinking

Data pulled from `arena.ai` via direct API proxies and cross-referenced with internal tools. Unverified fields were left untouched to preserve dataset integrity.

Co-authored-by: insign <1113045+insign@users.noreply.github.com>

google-labs-jules · 2026-04-21T17:15:38Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

chatgpt-codex-connector · 2026-04-21T17:15:41Z

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

cloudflare-workers-and-pages · 2026-04-21T17:18:30Z

Deploying showd with Cloudflare Pages

Latest commit:	`c8cdf27`
Status:	✅ Deploy successful!
Preview URL:	https://dd9bc94f.showd.pages.dev
Branch Preview URL:	https://feature-update-models-161539-2r7j.showd.pages.dev


insign merged commit caa5f02 into main Apr 21, 2026
3 checks passed

insign deleted the feature/update-models-16153910549189824861-11938924724193172488 branch April 21, 2026 19:13
